[BLIP] Fix daily CI failing test #20877
Conversation
The documentation is not available anymore as the PR was closed or merged.
sgugger
left a comment
That is a very big tolerance. It would be better to identify the layer in the model causing this problem.
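One way to act on this suggestion (not part of the PR itself) is to capture per-layer activations with forward hooks in each environment and diff them layer by layer. The sketch below is a minimal example; the checkpoint name and save path are assumptions, not taken from the test.

```python
# Minimal sketch: register forward hooks on every submodule, run the same inputs
# in each environment, save the captured activations, then diff them per layer.
import torch
from transformers import BlipForImageTextRetrieval

# Checkpoint name is an assumption for illustration purposes
model = BlipForImageTextRetrieval.from_pretrained("Salesforce/blip-itm-base-coco").eval()

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor):  # keep only plain tensor outputs
            activations[name] = output.detach().cpu()
    return hook

for name, module in model.named_modules():
    module.register_forward_hook(make_hook(name))

# ... run the test's forward pass here, then e.g.:
# torch.save(activations, "activations_torch_1.13.0.pt")  # hypothetical path
# Comparing the two dumps with torch.allclose per key shows where the outputs start to drift.
```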
Hmm, at the beginning I thought that the …
```python
# Suggested assertions: compare against the reference values with explicit tolerances
self.assertTrue(torch.allclose(torch.nn.Softmax()(out_itm[0].cpu()), expected_scores, atol=1e-3, rtol=1e-3))
self.assertTrue(torch.allclose(out[0].cpu(), torch.Tensor([[0.5053]]), atol=1e-3, rtol=1e-3))
# Previous check, without explicit tolerances
self.assertTrue(torch.allclose(out_itm[0][0][0].cpu(), expected_scores))
```
It would be great if we could figure out why the previous test logic fails across environments.
Let me know if I can help here, @younesbelkada :-)
On GCP (my own / CI runners), all torch versions (torch 1.13.x and torch 1.12.1) give `[[0.97982633 0.02017363]]` and `[[0.50528485]]`, so `[[0.9798, 0.0202]]` and `[[0.5053]]` will work. Not sure why you got a larger diff though, but it is likely an env issue.
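For reference, a quick sanity check (not from the PR) confirming that the rounded reference values above fall well within `atol=1e-3` / `rtol=1e-3` of the values observed on the GCP runners:

```python
import torch

observed_itm = torch.tensor([[0.97982633, 0.02017363]])
observed_cos = torch.tensor([[0.50528485]])

# Both comparisons succeed: the gaps are on the order of 1e-5, far below the tolerance.
print(torch.allclose(observed_itm, torch.tensor([[0.9798, 0.0202]]), atol=1e-3, rtol=1e-3))  # True
print(torch.allclose(observed_cos, torch.tensor([[0.5053]]), atol=1e-3, rtol=1e-3))          # True
```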
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
- add `model.eval`
- fix tolerance for GPU devices
Thanks so much @ydshieh 💯, the tests seem to pass now on the CI docker image with your suggested values!
What does this PR do?
This PR fixes: https://github.com/huggingface/transformers/actions/runs/3754402958/jobs/6378634199
Why is this fix relevant?
The reference logits for this test were obtained under `pytorch==1.13.1+cu116`, while the daily CI uses `pytorch==1.13.0+cu116`. Setting the tolerance slightly higher (`4e-2`) fixes the test and makes it compatible across versions.

cc @LysandreJik @sgugger @ydshieh
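For illustration only, a relaxed-tolerance comparison of this kind looks like the sketch below; the tensors are placeholders, not the test's actual outputs.

```python
import torch

expected = torch.tensor([[0.5053]])   # reference value used in the test
observed = torch.tensor([[0.5186]])   # hypothetical output from a different torch/CUDA build

assert torch.allclose(observed, expected, atol=4e-2)       # passes with the relaxed tolerance
assert not torch.allclose(observed, expected, atol=1e-3)   # would fail with the tighter one
```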